Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

Two types of outliers were analysed for this data set. An outlier at the bottom of an expression profile of a gene was named as a low extreme outlier. An outlier at the top of an expression profile of a gene was named as a high extreme outlier.
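The two outlier types can be illustrated with a short R sketch. The text does not say how outliers were detected, so the sketch assumes the boxplot (1.5 x IQR) rule as implemented by `boxplot.stats()`. The helper name `label_extreme_outliers` and the profile values are hypothetical.

```r
# Hypothetical helper: label the outliers in one gene's expression profile.
# Assumption: outliers are flagged with the boxplot (1.5 * IQR) rule,
# which the text does not specify.
label_extreme_outliers <- function(profile) {
  out <- boxplot.stats(profile)$out        # values beyond the whiskers
  is_out <- profile %in% out
  label <- rep(NA_character_, length(profile))
  # An outlier below the median sits at the bottom of the profile,
  # an outlier above the median sits at the top.
  label[is_out & profile < median(profile)] <- "low extreme"
  label[is_out & profile > median(profile)] <- "high extreme"
  label
}

# Synthetic profile for illustration only (not data from GDS3139)
profile <- c(7.9, 8.1, 8.0, 8.2, 3.5, 8.1, 7.8, 12.9)
labels <- label_extreme_outliers(profile)
```

With this synthetic profile, the fifth value is labelled a low extreme outlier and the eighth a high extreme outlier. All other entries stay `NA`.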

Figure 6.13 shows that the gene 221505_at in GDS3139 was classified as a type I error. It may be a non-DEG because one of the cancer replicates was a low extreme outlier. It was this outlier which pulled down the mean expression of the cancer replicates so as to cause the potential mis-classification of a non-DEG to a false DEG, a type I error. The raw p value was […]93 and the new p value was 0.0185 after the outlier was removed. As the critical p value was 0.01, it can be seen that the differential expression pattern turned around because of this low extreme outlier, i.e., the gene was discovered as a DEG when the outlier was not removed, but was classified as a non-DEG when the outlier was removed. A thorough investigation was done for all genes in this data set to examine the erroneous differential expression problem. It was assumed that a gene had a single outlier. The investigation found 33 genes, which were likely the false DEGs (the type I errors) in this data set (GDS3139).

Figure 6.13. A low extreme outlier of gene 221505_at in data set GDS3139 caused a type I error.
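The gene-by-gene investigation described above can be sketched in R as a re-test after dropping one candidate outlier per gene. This is a minimal sketch and not necessarily the procedure used for GDS3139. It assumes Welch's t test via `t.test()`, since the text does not name the test, and it assumes the candidate outlier is the replicate furthest from its own group's median. The function names, the gene names and the expression values are hypothetical.

```r
# Hypothetical helper: re-test one gene after dropping a single candidate outlier.
# Assumptions (not stated in the text): Welch's t test, and the candidate
# outlier is the replicate furthest from its own group's median.
retest_without_outlier <- function(control, cancer, alpha = 0.01) {
  raw_p <- t.test(control, cancer)$p.value

  dev_control <- abs(control - median(control))
  dev_cancer  <- abs(cancer - median(cancer))
  if (max(dev_cancer) >= max(dev_control)) {
    cancer <- cancer[-which.max(dev_cancer)]
  } else {
    control <- control[-which.max(dev_control)]
  }
  new_p <- t.test(control, cancer)$p.value

  # A false DEG candidate is significant before removal but not after it.
  data.frame(raw_p = raw_p, new_p = new_p,
             false_deg = raw_p < alpha && new_p >= alpha)
}

# Apply the re-test to every row (gene) of an expression matrix.
scan_genes <- function(expr, group, alpha = 0.01) {
  res <- lapply(seq_len(nrow(expr)), function(i) {
    retest_without_outlier(expr[i, group == "control"],
                           expr[i, group == "cancer"], alpha)
  })
  out <- do.call(rbind, res)
  rownames(out) <- rownames(expr)
  out
}

# Synthetic data for illustration only (not taken from GDS3139)
group <- rep(c("control", "cancer"), each = 5)
expr <- rbind(
  geneA = c(8.0, 8.1, 7.9, 8.2, 7.8, 10.0, 10.1, 9.9, 10.2, 9.8),
  geneB = c(8.0, 8.1, 7.9, 8.2, 7.8, 8.1, 8.0, 7.9, 8.2, 5.0)
)
res <- scan_genes(expr, group)
candidates <- rownames(res)[res$false_deg]
```

Neither synthetic gene is flagged: geneA remains significant after the re-test, and geneB is not significant in the raw test. On real data, the rows with `false_deg` equal to `TRUE` would be the candidate type I errors.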

In Figure 6.14, a low extreme outlier was present in the cancer replicates. Because this low extreme outlier occurred in the cancer replicates, the mean expression of the cancer replicates was pulled down significantly. The raw p value was 0.2096 if the outlier was not removed. However, if this low extreme outlier was removed, the new p value was 0.0093. Removing this low extreme outlier resulted in a discovered DEG.
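The opposite effect, in which an outlier hides a DEG, can be reproduced in R with made-up numbers. The values below are synthetic and are not the replicates shown in Figure 6.14. Welch's t test is again an assumption, since the text does not name the test.

```r
# Synthetic values for illustration only (not taken from GDS3139)
control <- c(8.0, 8.1, 7.9, 8.2, 7.8)
cancer  <- c(9.0, 9.1, 8.9, 9.2, 5.0)   # last replicate is a low extreme outlier

# Assumption: Welch's t test; the text does not name the test used.
raw_p <- t.test(control, cancer)$p.value

# Drop the lowest cancer replicate and test again.
cancer_clean <- cancer[-which.min(cancer)]
new_p <- t.test(control, cancer_clean)$p.value

alpha <- 0.01                            # critical p value used in the text
means  <- c(raw = mean(cancer), cleaned = mean(cancer_clean))
masked_deg <- raw_p >= alpha && new_p < alpha
```

In this synthetic case the outlier pulls the cancer mean down towards the control mean and inflates the variance, so the raw test is not significant at 0.01. After the outlier is dropped, the re-test is significant and `masked_deg` is `TRUE`.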